Skip to content

Conversation

@headius
Copy link
Contributor

@headius headius commented Dec 10, 2024

The shutdown process here attempted to terminate the test server by interrupting it with Thread#kill, and then proceeded to close the server and join the thread. The kill does indeed interrupt the accept call, but the close call could also interrupt the thread as part of notifying blocked threads waiting on that socket call.

In JRuby, where all of this can happen at the same time, it leads to the following scenario:

  • The server thread enters TCPServer#accept and blocks.
  • The main thread calls Thread#kill to interrupt the accept call.
  • The server thread wakes up and starts to propagate the kill. There is a slight delay between this wakeup and removing the server thread from the TCPServer's blocked threads list.
  • The main thread calls TCPServer#close, which sees that the server thread is still in the blocked list, so it initiates a second interrupt to raise IOError "closed in another thread" on the server thread.
  • As the kill is bubbling out, another check for interrupts occurs, causing it to see the new raise interrupt and propagate that instead of the active kill.
  • Because the server is now closed and the rescue here is empty, the server loop will endlessly attempt and fail to call accept.

I was unable to determine how CRuby avoids this race. There may be code that prevents an active kill interrupt from triggering further interrupts.

In order to get these tests running on JRuby, I've made the following changes:

  • Only kill the thread; one interrupt is sufficient to break it out of the accept call.
  • Ensure outside the server loop that the server gets closed. This happens within the server thread, so triggers no new interrupts.
  • Minor cleanup for the pattern of using @ssl_server or @server.

This change avoids the race in JRuby (and possibly other parallel- threaded implementations) and does not impact the behavior of the tests.

The shutdown process here attempted to terminate the test server
by interrupting it with Thread#kill, and then proceeded to close
the server and join the thread. The kill does indeed interrupt
the accept call, but the close call could also interrupt the
thread as part of notifying blocked threads waiting on that
socket call.

In JRuby, where all of this can happen at the same time, it leads
to the following scenario:

* The server thread enters TCPServer#accept and blocks.
* The main thread calls Thread#kill to interrupt the accept call.
* The server thread wakes up and starts to propagate the kill.
  There is a slight delay between this wakeup and removing the
  server thread from the TCPServer's blocked threads list.
* The main thread calls TCPServer#close, which sees that the server
  thread is still in the blocked list, so it initiates a second
  interrupt to raise IOError "closed in another thread" on the
  server thread.
* As the kill is bubbling out, another check for interrupts occurs,
  causing it to see the new raise interrupt and propagate that
  instead of the active kill.
* Because the server is now closed and the rescue here is empty,
  the server loop will endlessly attempt and fail to call accept.

I was unable to determine how CRuby avoids this race. There may be
code that prevents an active kill interrupt from triggering
further interrupts.

In order to get these tests running on JRuby, I've made the
following changes:

* Only kill the thread; one interrupt is sufficient to break it
  out of the accept call.
* Ensure outside the server loop that the server gets closed. This
  happens within the server thread, so triggers no new interrupts.
* Minor cleanup for the pattern of using @ssl_server or @server.

This change avoids the race in JRuby (and possibly other parallel-
threaded implementations) and does not impact the behavior of the
tests.
@headius headius requested review from hsbt and nobu December 10, 2024 05:49
@headius
Copy link
Contributor Author

headius commented Dec 10, 2024

@hsbt @nobu @ko1 I would appreciate some guidance on the interrupt questions here. Does CRuby have this race when multiple interrupts arrive in rapid succession? If not, how does it avoid an active "kill" interrupt getting overtaken by a second "raise" interrupt happening in parallel?

headius added a commit to headius/jruby that referenced this pull request Dec 10, 2024
Copied from my PR to fix this double-interrupt race.

See ruby/net-http#197
@headius
Copy link
Contributor Author

headius commented Dec 10, 2024

I have described the issue in more detail, with a simulated case that also triggers on CRuby, in this JRuby bug: jruby/jruby#8501

I think my main question here is: should a thread that is dying due to a "kill" be able to receive "raise" interrupts? And vice-versa?

Whether JRuby has an actual bug or not (perhaps we should try to make sure the killed thread does not receive the IOError raise?) I want to know the answer to these questions.

headius added a commit to headius/jruby that referenced this pull request Dec 27, 2024
This is the same situation as ruby/net-http#197, where a server
thread is interrupted twice and ends up in a loop. This is more
fallout from removing WEBrick and replacing its use in tests with
small hand-written server code that apparently exposes a behavior
difference in JRuby.
@headius
Copy link
Contributor Author

headius commented Dec 27, 2024

@hsbt I ran into another example of this "double interrupt" test server code in open-uri's test utils: jruby/jruby@a355cd7

I'm not convinced yet that there's a bug in JRuby, but in any case, the new code is reliable on both CRuby and JRuby and allows the tests to complete in our CI. Hopefully we can have a chat about these changes soon.

@hsbt hsbt merged commit e18aa61 into ruby:master Dec 31, 2024
22 checks passed
@ioquatix
Copy link
Member

Why not server = @ssl_server || @server and be done with all the conditionals later on?

@headius
Copy link
Contributor Author

headius commented Jan 20, 2025

@ioquatix also good!

@headius headius deleted the no_double_interrupt_test_server branch January 20, 2025 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants